generalizable singular value decomposition
Generalizable Singular Value Decomposition for Ill-posed Datasets
We demonstrate that statistical analysis of ill-posed data sets is subject to a bias, which can be observed when projecting indepen(cid:173) dent test set examples onto a basis defined by the training exam(cid:173) ples. Because the training examples in an ill-posed data set do not fully span the signal space the observed training set variances in each basis vector will be too high compared to the average vari(cid:173) ance of the test set projections onto the same basis vectors. On basis of this understanding we introduce the Generalizable Singu(cid:173) lar Value Decomposition (GenSVD) as a means to reduce this bias by re-estimation of the singular values obtained in a conventional Singular Value Decomposition, allowing for a generalization perfor(cid:173) mance increase of a subsequent statistical model. We demonstrate that the algorithm succesfully corrects bias in a data set from a functional PET activation study of the human brain. An ill-posed data set has more dimensions in each example than there are examples.
Generalizable Singular Value Decomposition for Ill-posed Datasets
Kjems, Ulrik, Hansen, Lars Kai, Strother, Stephen C.
Becausethe training examples in an ill-posed data set do not fully span the signal space the observed training set variances in each basis vector will be too high compared to the average variance ofthe test set projections onto the same basis vectors. On basis of this understanding we introduce the Generalizable Singular ValueDecomposition (GenSVD) as a means to reduce this bias by re-estimation of the singular values obtained in a conventional Singular Value Decomposition, allowing for a generalization performance increaseof a subsequent statistical model. We demonstrate that the algorithm succesfully corrects bias in a data set from a functional PET activation study of the human brain. 1 Ill-posed Data Sets An ill-posed data set has more dimensions in each example than there are examples. Such data sets occur in many fields of research typically in connection with image measurements. The associated statistical problem is that of extracting structure from the observed high-dimensional vectors in the presence of noise. The statistical analysis can be done either supervised (Le.